39 research outputs found
Thompson Sampling for a Fatigue-aware Online Recommendation System
In this paper we consider an online recommendation setting, where a platform
recommends a sequence of items to its users at every time period. The users
respond by selecting one of the items recommended or abandon the platform due
to fatigue from seeing less useful items. Assuming a parametric stochastic
model of user behavior, which captures positional effects of these items as
well as the abandoning behavior of users, the platform's goal is to recommend
sequences of items that are competitive with the single best sequence of items in
hindsight, without knowing the true user model a priori. Naively applying a
stochastic bandit algorithm in this setting leads to an exponential dependence
on the number of items. We propose a new Thompson sampling based algorithm with
expected regret that is polynomial in the number of items in this combinatorial
setting, and performs extremely well in practice.
An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions
We propose a new efficient online algorithm to learn the parameters governing
the purchasing behavior of a utility maximizing buyer, who responds to prices,
in a repeated interaction setting. The key feature of our algorithm is that it
can learn even non-linear buyer utility while working with arbitrary price
constraints that the seller may impose. This overcomes a major shortcoming of
previous approaches, which use unrealistic prices to learn these parameters,
making them unsuitable in practice.
Block-Structure Based Time-Series Models For Graph Sequences
Although the computational and statistical trade-off for modeling single
graphs, for instance, using block models is relatively well understood,
extending such results to sequences of graphs has proven to be difficult. In
this work, we take a step in this direction by proposing two models for graph
sequences that capture: (a) link persistence between nodes across time, and (b)
community persistence of each node across time. In the first model, we assume
that the latent community of each node does not change over time, and in the
second model we relax this assumption suitably. For both of these proposed
models, we provide statistically and computationally efficient inference
algorithms, whose unique feature is that they leverage community detection
methods that work on single graphs. We also provide experimental results
validating the suitability of our models and methods on synthetic and real
instances.
Comment: 40 pages, 10 figures
Optimizing Revenue over Data-driven Assortments
We revisit the problem of large-scale assortment optimization under the
multinomial logit choice model without any assumptions on the structure of the
feasible assortments. Scalable real-time assortment optimization has become
essential in e-commerce operations due to the need for personalization and the
availability of a large variety of items. While this can be done when there are
simplistic assortment choices to be made, not imposing any constraints on the
collection of feasible assortments gives more flexibility to incorporate
insights of store-managers and historically well-performing assortments. We
design fast and flexible algorithms based on variations of binary search that
find the revenue of the (approximately) optimal assortment. We speed up the
comparison steps using novel vector space embeddings, based on advances in the
information retrieval literature. For an arbitrary collection of assortments,
our algorithms can find a solution in time that is sub-linear in the number of
assortments and, for the simpler case of cardinality constraints, linear in the
number of items (existing methods are quadratic or worse). Empirical
validations using the Billion Prices dataset and several retail transaction
datasets show that our algorithms are competitive even when the number of items
is (x larger instances than previously studied).
Comment: 28 pages, 4 figures
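The binary-search idea in the abstract above can be illustrated with a minimal sketch. It assumes the standard multinomial logit revenue formula with the no-purchase weight normalized to 1, and it uses a plain linear scan for each comparison step rather than the vector space embeddings the abstract mentions. The function names, prices, and weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mnl_revenue(assortment: Sequence[int], prices: Sequence[float],
                weights: Sequence[float]) -> float:
    """Expected revenue of one assortment under the multinomial logit model.

    Assumes the no-purchase option has weight 1.
    """
    numerator = sum(prices[i] * weights[i] for i in assortment)
    denominator = 1.0 + sum(weights[i] for i in assortment)
    return numerator / denominator


def best_revenue(assortments: Sequence[Sequence[int]], prices: Sequence[float],
                 weights: Sequence[float], tol: float = 1e-6) -> float:
    """Binary search on a revenue threshold k.

    Some assortment S achieves revenue >= k exactly when
    sum_{i in S} weights[i] * (prices[i] - k) >= k,
    so each step is one feasibility comparison over the collection.
    """
    lo, hi = 0.0, max(prices)  # the optimal revenue lies in [0, max price]
    while hi - lo > tol:
        k = (lo + hi) / 2.0
        feasible = any(
            sum(weights[i] * (prices[i] - k) for i in s) >= k
            for s in assortments
        )
        if feasible:
            lo = k  # some assortment earns at least k
        else:
            hi = k  # no assortment earns k
    return lo
```

The linear scan inside `any(...)` is the part that a faster comparison routine would replace; the outer loop only needs a yes/no answer for each threshold.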
Faster Reinforcement Learning Using Active Simulators
In this work, we propose several online methods to build a \emph{learning
curriculum} from a given set of target-task-specific training tasks in order to
speed up reinforcement learning (RL). These methods can decrease the total
training time needed by an RL agent compared to training on the target task
from scratch. Unlike traditional transfer learning, we consider creating a
sequence from several training tasks in order to provide the most benefit in
terms of reducing the total time to train.
Our methods utilize the learning trajectory of the agent on the curriculum
tasks seen so far to decide which tasks to train on next. An attractive feature
of our methods is that they are weakly coupled to the choice of the RL
algorithm as well as the transfer learning method. Further, when there is
domain information available, our methods can incorporate such knowledge to
further speed up the learning. We experimentally show that these methods can be
used to obtain suitable learning curricula that speed up the overall training
time on two different domains.
Comment: 12 pages and 4 figures. More experiments added to the previous version
Symmetry Learning for Function Approximation in Reinforcement Learning
In this paper we explore methods to exploit symmetries for ensuring sample
efficiency in reinforcement learning (RL). This problem deserves ever-increasing
attention given recent advances in the use of deep networks for
complex RL tasks, which require large amounts of training data. We introduce a
novel method to detect symmetries using reward trails observed during episodic
experience and prove its completeness. We also provide a framework to
incorporate the discovered symmetries for functional approximation. Finally, we
show that the use of potential based reward shaping is especially effective for
our symmetry exploitation mechanism. Experiments on various classical problems
show that our method improves the learning performance significantly by
utilizing symmetry information.
Comment: 12 pages, 3 figures. A preliminary version appears in AAMAS 2017.
Also presented at the 3rd Multidisciplinary Conference on Reinforcement
Learning and Decision Making
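For readers unfamiliar with potential-based reward shaping, mentioned in the abstract above, here is a minimal sketch of the standard form, in which the shaping term is gamma * phi(s') - phi(s) for a potential function phi. It is not the paper's symmetry mechanism. The function names and the example potential are hypothetical.

```python
from typing import Callable, Sequence


def shaping_term(phi: Callable[[int], float], state: int, next_state: int,
                 gamma: float) -> float:
    """Potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * phi(next_state) - phi(state)


def discounted_return(rewards: Sequence[float], states: Sequence[int],
                      gamma: float,
                      phi: Callable[[int], float] = lambda s: 0.0) -> float:
    """Discounted return of one trajectory with shaped rewards.

    states has one more entry than rewards: states[t] -> states[t + 1]
    yields rewards[t]. With the default phi the return is unshaped.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        shaped = reward + shaping_term(phi, states[t], states[t + 1], gamma)
        total += (gamma ** t) * shaped
    return total
```

Over a trajectory of length T the shaping terms telescope, so the shaped return differs from the unshaped one by gamma^T * phi(s_T) - phi(s_0), a quantity that depends only on the endpoints.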
Generalization Bounds for Learning with Linear, Polygonal, Quadratic and Conic Side Knowledge
In this paper, we consider a supervised learning setting where side knowledge
is provided about the labels of unlabeled examples. The side knowledge has the
effect of reducing the hypothesis space, leading to tighter generalization
bounds, and thus possibly better generalization. We consider several types of
side knowledge, the first leading to linear and polygonal constraints on the
hypothesis space, the second leading to quadratic constraints, and the last
leading to conic constraints. We show how different types of domain knowledge
can lead directly to these kinds of side knowledge. We prove bounds on
complexity measures of the hypothesis space for quadratic and conic side
knowledge, and show that these bounds are tight in a specific sense for the
quadratic case.
Comment: 37 pages, 3 figures, a shorter version appeared in ISAIM 2014 (new
additions include a reference change and a new figure)
The Costs and Benefits of Sharing: Sequential Individual Rationality and Sequential Fairness
In designing dynamic shared service systems that incentivize customers to opt
for shared rather than exclusive service, the traditional notion of individual
rationality may be insufficient, as a customer's estimated utility could
fluctuate arbitrarily during their time in the shared system, as long as their
realized utility at service completion is not worse than that for exclusive
service. In this work, within a model that explicitly considers the
"inconvenience costs" incurred by customers due to sharing, we introduce the
notion of sequential individual rationality (SIR) that requires that the
disutility of existing customers is nonincreasing as the system state changes
due to new customer arrivals. Next, under SIR, we observe that cost sharing can
also be viewed as benefit sharing, which inspires a natural definition of
sequential fairness (SF): the total incremental benefit due to a new customer
is shared among existing customers in proportion to the incremental
inconvenience suffered.
We demonstrate the effectiveness of these notions by applying them to a
ridesharing system, where unexpected detours to pick up subsequent passengers
inconvenience the existing passengers. Imposing SIR and SF reveals interesting
and surprising results, including: (a) natural limits on the incremental
detours permissible, (b) exact characterization of "SIR-feasible" routes, which
boast sublinear upper and lower bounds on the fractional detours, (c) exact
characterization of sequentially fair cost sharing schemes, which includes a
strong requirement that passengers must compensate each other for the detour
inconveniences that they cause, and (d) new algorithmic problems related to and
motivated by SIR.
Comment: Presented as a poster at EC 2016. Presented as an invited talk
(sponsored session) at INFORMS Annual Meeting 2016. Presented at MSOM Service
Operations SIG 2017. Currently under review at Management Science
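The proportional-sharing rule behind sequential fairness, as described in the abstract above, can be sketched as follows. This is only an illustration of the stated proportionality, not the paper's full cost sharing scheme. The function name and the handling of the zero-inconvenience case are assumptions.

```python
from typing import List, Sequence


def split_incremental_benefit(total_benefit: float,
                              inconveniences: Sequence[float]) -> List[float]:
    """Share a new customer's incremental benefit among existing customers.

    Each existing customer receives a share proportional to the incremental
    inconvenience that the new arrival imposes on them.
    """
    total_inconvenience = sum(inconveniences)
    if total_inconvenience <= 0:
        # Assumption: the proportional rule is undefined when nobody is
        # inconvenienced, so this sketch refuses to pick a split.
        raise ValueError("total incremental inconvenience must be positive")
    return [total_benefit * x / total_inconvenience for x in inconveniences]
```

For example, a benefit of 6.0 shared between two passengers whose detours cost them 1.0 and 2.0 gives shares of 2.0 and 4.0.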
Learning to Partition using Score Based Compatibilities
We study the problem of learning to partition users into groups, where one
must learn the compatibilities between the users to achieve optimal groupings.
We define four natural objectives that optimize for average and worst case
compatibilities and propose new algorithms for adaptively learning optimal
groupings. When we do not impose any structure on the compatibilities, we show
that the group formation objectives considered are hard to solve and we
either give approximation guarantees or prove inapproximability results. We
then introduce an elegant structure, namely that of \textit{intrinsic scores},
that makes many of these problems polynomial time solvable. We explicitly
characterize the optimal groupings under this structure and show that the
optimal solutions are related to \emph{homophilous} and \emph{heterophilous}
partitions, well-studied in the psychology literature. For one of the four
objectives, we show hardness under the score structure and give an
approximation algorithm for which no constant approximation was
known thus far. Finally, under the score structure, we propose an online low
sample complexity PAC algorithm for learning the optimal partition. We
demonstrate the efficacy of the proposed algorithm on synthetic and real world
datasets.
Comment: Appears in the Proceedings of the 16th International Conference on
Autonomous Agents and Multiagent Systems (AAMAS 2017)
Privacy-preserving Targeted Advertising
Recommendation systems form the centerpiece of a rapidly growing
trillion-dollar online advertisement industry. Even with numerous optimizations and
approximations, collaborative filtering (CF) based approaches require real-time
computations involving very large vectors. Curating and storing such related
profile information vectors on web portals seriously breaches the user's
privacy. Modifying such systems to achieve private recommendations further
requires communication of long encrypted vectors, making the whole process
inefficient. We present a more efficient recommendation system alternative, in
which user profiles are maintained entirely on their device, and appropriate
recommendations are fetched from web portals in an efficient privacy preserving
manner. We base this approach on association rules.
Comment: A preliminary version was presented at the 11th INFORMS Workshop on
Data Mining and Decision Analytics (2016)